Gaussian Copula Precision Estimation with Missing Values
Abstract
We consider the problem of estimating the sparse precision matrix of Gaussian copula distributions in high dimensions from samples with missing values. Existing approaches, designed primarily for Gaussian distributions, use plug-in estimators that simply disregard the missing values. In this paper, we propose double plug-in Gaussian (DoPinG) copula estimators to estimate the sparse precision matrix of nonparanormal distributions. DoPinG uses two plug-in procedures and consists of three steps: (1) estimate nonparametric correlations, such as Kendall's tau and Spearman's rho, from the observed values; (2) estimate the nonparanormal correlation matrix; (3) plug this matrix into existing sparse precision estimators. We prove that DoPinG copula estimators consistently estimate the nonparanormal correlation matrix at a rate of $O\left(\frac{1}{1-\delta}\sqrt{\frac{\log p}{n}}\right)$, where $\delta$ is the probability that a value is missing. We provide experimental results illustrating the effect of sample size and the percentage of missing data on model performance. The experiments show that DoPinG performs significantly better than estimators such as mGlasso, which are designed primarily for Gaussian data.
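The first two DoPinG steps can be illustrated with a minimal NumPy sketch. It assumes missing entries are stored as NaN and that each pairwise rank correlation is computed only on the samples where both variables are observed (pairwise-complete); the abstract does not spell out this detail, so treat it as an assumption rather than the authors' exact procedure. Step (2) uses the standard nonparanormal bridges $\hat\Sigma_{jk} = \sin(\frac{\pi}{2}\hat\tau_{jk})$ for Kendall's tau and $\hat\Sigma_{jk} = 2\sin(\frac{\pi}{6}\hat\rho_{jk})$ for Spearman's rho. The eigenvalue clipping in `project_psd` is an added safeguard, not something the abstract describes, and all function names are hypothetical.

```python
import numpy as np


def kendall_tau_pairwise(x, y):
    """Kendall's tau-a on rows where both x and y are observed (not NaN)."""
    mask = ~(np.isnan(x) | np.isnan(y))
    xs, ys = x[mask], y[mask]
    m = xs.size
    if m < 2:
        return 0.0  # hypothetical fallback: too few jointly observed samples
    dx = np.sign(xs[:, None] - xs[None, :])
    dy = np.sign(ys[:, None] - ys[None, :])
    # Summing over all ordered pairs counts each unordered pair twice.
    return float((dx * dy).sum() / (m * (m - 1)))


def spearman_rho_pairwise(x, y):
    """Spearman's rho on jointly observed rows, assuming no ties."""
    mask = ~(np.isnan(x) | np.isnan(y))
    xs, ys = x[mask], y[mask]
    if xs.size < 2:
        return 0.0  # hypothetical fallback, as above
    rx = np.argsort(np.argsort(xs)).astype(float)  # ranks 0..m-1
    ry = np.argsort(np.argsort(ys)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])


def nonparanormal_corr(X, method="kendall"):
    """Steps (1)-(2): rank correlations mapped to the latent Gaussian scale."""
    p = X.shape[1]
    S = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            if method == "kendall":
                s = np.sin(np.pi / 2 * kendall_tau_pairwise(X[:, j], X[:, k]))
            else:
                s = 2 * np.sin(np.pi / 6 * spearman_rho_pairwise(X[:, j], X[:, k]))
            S[j, k] = S[k, j] = s
    return S


def project_psd(S, eps=1e-4):
    """Added safeguard: clip eigenvalues, then rescale to unit diagonal."""
    w, V = np.linalg.eigh(S)
    S_psd = (V * np.maximum(w, eps)) @ V.T
    d = np.sqrt(np.diag(S_psd))
    return S_psd / np.outer(d, d)


# Demo: latent Gaussian with a tridiagonal correlation matrix.
rng = np.random.default_rng(0)
n, p, delta = 500, 4, 0.2
Sigma = np.array([[1.0, 0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.5, 0.0],
                  [0.0, 0.5, 1.0, 0.5],
                  [0.0, 0.0, 0.5, 1.0]])
Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
X = np.exp(Z)            # monotone marginal transforms give nonparanormal data
X[:, 1] = Z[:, 1] ** 3   # a different monotone transform for one column
X[rng.random((n, p)) < delta] = np.nan  # each value missing with probability delta
Sigma_hat = project_psd(nonparanormal_corr(X, "kendall"))
```

For step (3), `Sigma_hat` could be handed to any sparse precision estimator; for instance, scikit-learn's `sklearn.covariance.graphical_lasso(Sigma_hat, alpha)` returns an estimated covariance and a sparse precision matrix. The penalty `alpha` would need tuning and is not specified in the abstract.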
Similar resources
EM algorithm in Gaussian copula with missing data
Rank-based correlation is widely used to measure dependence between variables when their marginal distributions are skewed. Estimation of such correlation is challenged both by the presence of missing data and by the need to adjust for confounding factors. In this paper, we consider a unified framework of Gaussian copula regression that enables us to estimate either Pearson correlation or rank-...
Spatial Copula Model for Imputing Traffic Flow Data from Remote Microwave Sensors
Issues of missing data have become increasingly serious with the rapid increase in the use of traffic sensors. Analyses of the Beijing ring expressway have shown that up to 50% of microwave sensors have missing values. The imputation of missing traffic data must be addressed urgently, although a precise solution cannot be easily achieved because of the significant number of missing portions. In thi...
A Bayesian Approach to Inference and Prediction for Spatially Correlated Count Data Based on Gaussian Copula Model
The Gaussian copula has been successfully applied to spatially correlated count data because of its ability to fully model high-dimensional dependence. In this article, we develop a Bayesian method for both parameter estimation and spatial prediction for a spatially correlated count data set. An MCMC scheme (Metropolis-Hastings algorithm plus rejection sampling) is adopted to iteratively u...
Spatial Interpolation Using Copula for non-Gaussian Modeling of Rainfall Data
One of the most useful tools for handling multivariate distributions of dependent variables in terms of their marginal distributions is the copula function. Copula families attract considerable attention because of their applicability and flexibility in describing non-Gaussian spatially dependent data. The particular properties of the spatial copula are rarely ...
Multi-task Sparse Structure Learning with Gaussian Copula Models
Multi-task learning (MTL) aims to improve generalization performance by learning multiple related tasks simultaneously. While the underlying task relationship structure is sometimes known, it often needs to be estimated from the data at hand. In this paper, we present a novel family of models for MTL, applicable to regression and classification problems, capable of learning the structure...